Meta Llama 3 8B Instruct FP8 KV
The Meta-Llama-3-8B-Instruct model has its weights and activations quantized to FP8 with per-tensor scales, making it suitable for inference with vLLM >= 0.5.0. The checkpoint also includes per-tensor scaling factors for the FP8-quantized KV cache.
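A minimal sketch of serving this checkpoint with vLLM, assuming vLLM >= 0.5.0 and a CUDA GPU with FP8 support; the model path below is a placeholder for wherever this checkpoint is stored, and the prompt and sampling settings are illustrative only:

```python
from vllm import LLM, SamplingParams

# Placeholder path -- point this at the actual FP8 checkpoint directory or Hub ID.
llm = LLM(
    model="Meta-Llama-3-8B-Instruct-FP8-KV",
    kv_cache_dtype="fp8",  # store the KV cache in FP8 using the checkpoint's per-tensor scales
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

Because the FP8 weight and activation scales are baked into the checkpoint, vLLM picks up the quantization scheme automatically; `kv_cache_dtype="fp8"` additionally enables the FP8 KV cache, using the per-tensor KV scaling parameters shipped with the model.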
Large Language Model
Transformers